Diamonds Data

Diamonds Analysis

Price

Diamonds in Dataset

53940

Average Price (USD)

3932.8

Number of Low-Priced Diamonds

30334

Number of High-Priced Diamonds

23606
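
The two counts add up to the total (30334 + 23606 = 53940), so every diamond is assigned to exactly one of the two price groups. The dashboard does not say where the boundary between "low" and "high" lies, so this illustrative Python sketch takes the threshold as a hypothetical parameter and uses toy prices rather than the real data:

```python
from statistics import mean


def value_box_summary(prices, threshold):
    """Recompute the value-box figures (count, average, low, high) for a list of prices.

    `threshold` is a hypothetical parameter: the dashboard does not state
    which price separates low-priced from high-priced diamonds.
    """
    # Assumption: a diamond priced exactly at the threshold counts as high-priced.
    low = sum(1 for p in prices if p < threshold)
    return {
        "count": len(prices),
        "average": round(mean(prices), 1),  # one decimal, like the 3932.8 box
        "low": low,
        "high": len(prices) - low,  # every diamond falls in exactly one group
    }


# Toy prices for illustration only, not taken from the diamonds data.
summary = value_box_summary([500, 1500, 4000, 9000], threshold=3000)
print(summary)
```

Whatever threshold is used, `low + high` always equals `count`, which is the relationship the three value boxes show.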

Color of Diamonds

Clarity of Diamonds

Graphs 1

Histogram of Price

Histogram of Price by Cut

Histogram of Diamond Weight (Carat)

Bar Graph of Diamonds by Cut

Bar Graph of Diamonds by Clarity

Bar Graph of Diamonds by Color

Graphs 2

Scatter Plot of Carat vs Price by Clarity

Scatter Plot of Carat vs Price by Color

Violin Plot of Price by Color, Grouped by Clarity

Data Table

Random Forest

Summary

  • I first fit a provisional random forest with mtry = 4 and 1000 trees.
  • I then settled on mtry = 3 after a for loop over candidate mtry values showed it to be the best choice.
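
The mtry search described above can be sketched as follows. This is a Python/scikit-learn illustration, not the code behind the dashboard: scikit-learn's `max_features` plays the role of R's `mtry` (the number of predictors tried at each split), the selection criterion (out-of-bag R²) is an assumption because the summary does not name one, 100 trees are used instead of 1000 to keep it fast, and the data are synthetic with nine predictors (the diamonds data has nine columns besides price).

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor


def tune_max_features(X, y, candidates, n_estimators=100, seed=0):
    """Fit one forest per candidate value and return (best, scores).

    scores maps each candidate to its out-of-bag R^2; higher is better.
    """
    scores = {}
    for m in candidates:
        forest = RandomForestRegressor(
            n_estimators=n_estimators,
            max_features=m,   # predictors tried at each split, like R's mtry
            oob_score=True,   # score each tree on the rows it did not train on
            random_state=seed,
        )
        forest.fit(X, y)
        scores[m] = forest.oob_score_
    best = max(scores, key=scores.get)
    return best, scores


# Synthetic stand-in data with nine predictors (not the diamonds data).
X, y = make_regression(n_samples=500, n_features=9, n_informative=5,
                       noise=10.0, random_state=0)
best, scores = tune_max_features(X, y, candidates=range(1, 10))
for m, score in scores.items():
    print(f"max_features={m}: OOB R^2 = {score:.3f}")
print("best:", best)
```

Maximizing out-of-bag R² and minimizing out-of-bag MSE pick the same value, since both are computed against the same response.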

Model


Call:
glm(formula = price ~ clarity + color + cut + depth, family = inverse.gaussian(link = "log"), 
    data = diamonds)

Deviance Residuals: 
      Min         1Q     Median         3Q        Max  
-0.051828  -0.023033  -0.007450   0.004268   0.058727  

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)   8.548446   0.232441  36.777  < 2e-16 ***
claritySI2    0.265599   0.042885   6.193 5.94e-10 ***
claritySI1    0.031997   0.042118   0.760   0.4474    
clarityVS2   -0.003939   0.042213  -0.093   0.9257    
clarityVS1   -0.017628   0.042783  -0.412   0.6803    
clarityVVS2  -0.090097   0.043568  -2.068   0.0386 *  
clarityVVS1  -0.342667   0.043812  -7.821 5.32e-15 ***
clarityIF    -0.207875   0.047203  -4.404 1.07e-05 ***
colorE       -0.038185   0.015346  -2.488   0.0128 *  
colorF        0.182064   0.016178  11.253  < 2e-16 ***
colorG        0.272189   0.015965  17.049  < 2e-16 ***
colorH        0.292381   0.017251  16.949  < 2e-16 ***
colorI        0.426022   0.020272  21.015  < 2e-16 ***
colorJ        0.489487   0.026670  18.354  < 2e-16 ***
cutGood      -0.069376   0.033264  -2.086   0.0370 *  
cutVery Good -0.044312   0.031616  -1.402   0.1611    
cutPremium    0.049652   0.032058   1.549   0.1214    
cutIdeal     -0.168951   0.031003  -5.450 5.07e-08 ***
depth        -0.007137   0.003540  -2.016   0.0438 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for inverse.gaussian family taken to be 0.0003049347)

    Null deviance: 21.658  on 53939  degrees of freedom
Residual deviance: 20.743  on 53921  degrees of freedom
AIC: 989018

Number of Fisher Scoring iterations: 10
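
The null and residual deviances give a quick pseudo-R² for this model: the fraction of the null deviance that the predictors remove. A minimal Python sketch using the two values printed above:

```python
# Deviances copied from the glm() summary above.
NULL_DEVIANCE = 21.658      # on 53939 degrees of freedom
RESIDUAL_DEVIANCE = 20.743  # on 53921 degrees of freedom


def deviance_explained(null_dev, resid_dev):
    """Fraction of the null deviance removed by the predictors."""
    return 1.0 - resid_dev / null_dev


share = deviance_explained(NULL_DEVIANCE, RESIDUAL_DEVIANCE)
print(f"Deviance explained: {share:.1%}")  # prints "Deviance explained: 4.2%"
```

By this measure, clarity, color, cut and depth together account for about 4% of the deviance in price.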

Analysis of Deviance Table

Model: inverse.gaussian, link: log

Response: price

Terms added sequentially (first to last)


        Df Deviance Resid. Df Resid. Dev
NULL                    53939     21.658
clarity  7  0.42625     53932     21.232
color    6  0.37550     53926     20.856
cut      4  0.11166     53922     20.745
depth    1  0.00131     53921     20.743
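
The table lists how much deviance each term removes when it is added in order. This minimal Python sketch expresses each term's deviance as a share of the total explained deviance and computes the F statistic (deviance per degree of freedom divided by the estimated dispersion), which is what R's `anova()` reports for a glm when called with `test = "F"`. Because the inputs are the rounded printed values, the results are approximate.

```python
# (Df, Deviance) per term, copied from the analysis-of-deviance table above.
TERMS = {
    "clarity": (7, 0.42625),
    "color": (6, 0.37550),
    "cut": (4, 0.11166),
    "depth": (1, 0.00131),
}
DISPERSION = 0.0003049347  # dispersion estimate from the glm() summary


def term_summary(terms, dispersion):
    """Share of explained deviance and F statistic for each sequential term."""
    explained = sum(dev for _, dev in terms.values())
    rows = {}
    for name, (df, dev) in terms.items():
        rows[name] = {
            "share": dev / explained,
            "F": (dev / df) / dispersion,  # compare with F(df, 53921)
        }
    return rows


rows = term_summary(TERMS, DISPERSION)
for name, row in rows.items():
    print(f"{name:8s} share = {row['share']:6.1%}   F = {row['F']:7.1f}")
```

Since terms enter sequentially, the shares depend on the order clarity, color, cut, depth; a different order would split the same 0.915 of explained deviance (21.658 minus 20.743) differently.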

[1] 0.9999995

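Exponentiated coefficients (exp of the estimates above): multiplicative effects on expected price under the log link
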
 (Intercept)   claritySI2   claritySI1   clarityVS2   clarityVS1  clarityVVS2 
5158.7299923    1.3042114    1.0325144    0.9960687    0.9825261    0.9138429 
 clarityVVS1    clarityIF       colorE       colorF       colorG       colorH 
   0.7098747    0.8123089    0.9625351    1.1996904    1.3128347    1.3396138 
      colorI       colorJ      cutGood cutVery Good   cutPremium     cutIdeal 
   1.5311551    1.6314789    0.9329760    0.9566550    1.0509053    0.8445502 
       depth 
   0.9928889 
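
Because the model uses a log link, each exponentiated coefficient multiplies the expected price relative to the reference levels (the levels with no row in the table: clarity I1, color D, cut Fair), holding the other terms fixed. The exponentiated intercept is the expected price at those levels with depth = 0, which is an extrapolation. This minimal Python sketch turns the printed estimates into percent changes and a predicted price; the example diamond is hypothetical.

```python
import math

# Estimates copied from the glm() summary above (log link).
COEF = {
    "(Intercept)": 8.548446,
    "claritySI2": 0.265599,
    "claritySI1": 0.031997,
    "clarityVS2": -0.003939,
    "clarityVS1": -0.017628,
    "clarityVVS2": -0.090097,
    "clarityVVS1": -0.342667,
    "clarityIF": -0.207875,
    "colorE": -0.038185,
    "colorF": 0.182064,
    "colorG": 0.272189,
    "colorH": 0.292381,
    "colorI": 0.426022,
    "colorJ": 0.489487,
    "cutGood": -0.069376,
    "cutVery Good": -0.044312,
    "cutPremium": 0.049652,
    "cutIdeal": -0.168951,
    "depth": -0.007137,
}

# Reference levels: the levels that have no coefficient in the table.
REFERENCE = {"clarity": "I1", "color": "D", "cut": "Fair"}


def percent_change(term):
    """Percent change in expected price relative to the reference level."""
    return (math.exp(COEF[term]) - 1.0) * 100.0


def expected_price(clarity, color, cut, depth):
    """Expected price on the response scale: exp of the linear predictor."""
    eta = COEF["(Intercept)"] + COEF["depth"] * depth
    for factor, level in (("clarity", clarity), ("color", color), ("cut", cut)):
        if level != REFERENCE[factor]:
            eta += COEF[factor + level]  # e.g. "cut" + "Very Good"
    return math.exp(eta)


print(f"colorJ vs colorD: {percent_change('colorJ'):+.1f}%")  # +63.1%
# Hypothetical diamond, chosen only to illustrate the calculation.
print(round(expected_price("VS2", "G", "Ideal", depth=61.8), 2))
```

Multiplying the exponentiated values shown above gives the same prediction as summing the estimates and exponentiating once, since exp(a + b) = exp(a) * exp(b).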

[1] 1